Using Group Membership Markers for Group Identification in Web Logs

Authors

  • Jean Mark Gawron
  • Dipak Gupta
  • Kellen Stephens
  • Ming-Hsiang Tsou
  • Brian Spitzberg
  • Li An
Abstract

We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. Our ranking system employs a small hand-selected vocabulary based on group membership markers that insiders use to identify members and member properties (us) and outsiders and threats (them). We use the same vocabulary to build a classifier. Evaluating several ranking systems by their correlations with human judgments, we show that the best ranker uses the small us-them vocabulary, outperforming both a system with a much larger vocabulary and one with a small vocabulary chosen by Mutual Information. We confirm and extend recent results in sentiment analysis (Paltoglou and Thelwall 2010), showing that a feature-weighting scheme taken from classical IR (TFIDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training, while producing a better classifier, produces a worse ranker. Increasing vocabulary size similarly improves classification while worsening ranking. Finally, we experiment with adding usage models to both systems: models of how well each word’s syntactic usage pattern matches the usage pattern in a class model. These models do not benefit ranking, but they increase the precision of the classifier. Our work complements and extends previous work tracking radical groups on the web (Chen 2007; Zhou et al. 2007; Burris, Smith, and Strahm 2000), which classified such sites with heterogeneous indicators, including document, vocabulary, and morphological features. The method combines elements of linguistics, machine learning, and behavioral science, and in principle can be extended to data collection aimed at any group organized for collective action.
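The abstract does not give the ranker's formulas, so the following is only a minimal sketch of the general idea under stated assumptions: each document is scored by the TF-IDF-weighted frequency of a small, fixed us/them marker vocabulary, documents are ranked by that score, and the ranking is compared with human militancy ratings by a rank correlation. The marker words, the smoothed IDF formula, the length normalization, the choice of Spearman's rho as the correlation, the example texts and ratings, and all function names (`rank_documents`, `tfidf_score`, `spearman`, etc.) are hypothetical illustrations, not the authors' vocabulary or implementation.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL marker vocabulary for illustration only; the paper's
# hand-selected us/them list is not reproduced in the abstract.
MARKER_VOCAB = frozenset({"we", "our", "brothers", "enemy", "traitors", "invaders"})


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_weights(docs, vocab):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, per marker term (assumed variant)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        for term in vocab & set(tokens):
            df[term] += 1
    return {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}


def tfidf_score(tokens, idf):
    """Sum of tf * idf over marker terms, divided by document length (assumed)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] * weight for term, weight in idf.items()) / len(tokens)


def rank_documents(texts, vocab=MARKER_VOCAB):
    """Return (doc_index, score) pairs, highest marker-weighted score first."""
    docs = [tokenize(t) for t in texts]
    idf = idf_weights(docs, vocab)
    scores = [tfidf_score(d, idf) for d in docs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    texts = [
        "Our brothers must stand against the invaders and the traitors.",
        "We met at the park and talked about the weather.",
        "The enemy is at the gate, brothers.",
    ]
    human = [3, 1, 2]  # HYPOTHETICAL annotator militancy ratings
    ranking = rank_documents(texts)
    print([i for i, _ in ranking])  # → [0, 2, 1]
    scores = [s for _, s in sorted(ranking)]  # back in document order
    print(spearman(scores, human))  # → 1.0
```

Because the vocabulary is small and fixed, each score reflects only how densely a document uses marker terms, with rarer markers weighted more heavily by IDF. The SVM-adjusted weights, the Mutual Information vocabulary, and the syntactic usage models mentioned in the abstract are not modeled in this sketch.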

Similar resources

Comparison of Weblogs of Iranian Libraries and Librarians with Top Librarianship Weblogs; 1385

Introduction: Web logs are evident tools for librarians. There are three main ways of applying web logs in librarianship: personal use by librarians to upgrade their personal information, as a source of information for libraries, and for their services. The aim of this research is to compare Iranian libraries and librarians with superior librarianshi...

Using Group Membership Markers for Group Identification

We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and prioritizing the data found. We compare three ranking systems, one employing a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them), one with a much ...

User Interest Level Based Preprocessing Algorithms Using Web Usage Mining

Web logs play an important role in understanding user behavior. Several pattern mining techniques have been developed to understand user behavior. A specific kind of preprocessing technique improves the quality and accuracy of pattern mining algorithms. Existing algorithms perform preprocessing to reduce the size of the log file and to identify the number of unique users...

Optimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining

Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page is one type of web data; it was regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...

When objective group membership and subjective ethnic identification don’t align: How identification shapes intergroup bias through self-enhancement and perceived threat

When objective group membership and subjective ethnic identification don’t align, which has a greater impact on how people feel towards the groups they affiliate with, and why? Deprived of many distinctiveness markers typically found in intergroup relations (e.g., physical features, obvious status differences), Taiwanese society provides a perfect natural context to explore the impact of object...

Publication date: 2012